Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis
Abstract
BACKGROUND Biomedical semantic indexing is a valuable support tool for human curators who index and catalog the biomedical literature.
OBJECTIVE The aim of this study was to describe a system that automatically assigns Medical Subject Headings (MeSH) to biomedical articles from MEDLINE.
METHODS Our approach relies on the assumption that similar documents should be classified by similar MeSH terms. Previous work has already exploited document similarity by using a k-nearest neighbors algorithm; here, we represent documents as document vectors through search engine indexing and then compute the similarity between documents using cosine similarity. Once the most similar documents for a given input document are retrieved, we rank their MeSH terms to choose the most suitable set for the input document. To do this, we define a scoring function that takes into account the frequency of each term in the set of retrieved documents and the similarity between the input document and each retrieved document. In addition, we implement guidelines that human curators follow when annotating MEDLINE articles, in particular the heuristic stating that if three MeSH terms proposed to classify an article share the same ancestor, they should be replaced by that ancestor. Representing the MeSH thesaurus as a graph database allows us to use graph search algorithms to capture hierarchical relationships, such as the lowest common ancestor between terms, quickly and easily.
RESULTS Our experiments show promising results, with an F1 score of 69% on the test dataset.
CONCLUSIONS To the best of our knowledge, this is the first work that combines search and graph database technologies for the task of biomedical semantic indexing. Due to its horizontal scalability, ElasticSearch is a practical solution for indexing large collections of documents such as the bibliographic database MEDLINE. Moreover, using graph search algorithms to access MeSH information could provide a support tool for cataloging MEDLINE abstracts in real time.
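The retrieval step (document vectors compared with cosine similarity, then the k most similar documents kept) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper builds document vectors through ElasticSearch indexing, whereas this sketch assumes plain term-frequency vectors produced by a naive whitespace tokenizer in pure Python. The function names (`tokenize`, `cosine`, `nearest_neighbors`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive lowercase whitespace tokenizer (an assumption; ElasticSearch uses analyzers)."""
    return text.lower().split()


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query: str, corpus: dict[str, str], k: int) -> list[tuple[str, float]]:
    """Return the k corpus documents most similar to the query, most similar first."""
    query_vec = Counter(tokenize(query))
    scored = [
        (doc_id, cosine(query_vec, Counter(tokenize(text))))
        for doc_id, text in corpus.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus keyed by made-up document IDs.
    corpus = {
        "pmid1": "aspirin reduces myocardial infarction risk",
        "pmid2": "gene expression in breast cancer",
        "pmid3": "aspirin and heart attack prevention",
    }
    top = nearest_neighbors("aspirin prevents myocardial infarction", corpus, k=2)
    print([doc_id for doc_id, _ in top])  # ['pmid1', 'pmid3']
```

A production system would replace the raw term counts with weighted vectors (for example TF-IDF) and the linear scan with an inverted index, which is the work a search engine takes over.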
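The abstract states only that the scoring function combines how often a MeSH term occurs among the retrieved documents with the similarity of each retrieved document to the input; it does not give the formula. One plausible reading, offered here purely as an assumption, is similarity-weighted voting: every retrieved document adds its similarity to the score of each MeSH term it carries, so a term scores higher when it appears in more neighbors and in closer ones. The names `score_mesh_terms` and `select_terms`, and the relative cutoff `ratio` used to pick the final set, are hypothetical.

```python
from collections import defaultdict


def score_mesh_terms(neighbors: list[tuple[float, set[str]]]) -> dict[str, float]:
    """Similarity-weighted voting over the MeSH terms of the retrieved documents.

    Hypothetical scoring function: each neighbor adds its similarity to every
    MeSH term it is indexed with, so a term's score grows with both how many
    neighbors carry it (frequency) and how similar those neighbors are.
    """
    scores = defaultdict(float)
    for similarity, mesh_terms in neighbors:
        for term in mesh_terms:
            scores[term] += similarity
    return dict(scores)


def select_terms(scores: dict[str, float], ratio: float = 0.5) -> list[str]:
    """Keep terms scoring at least `ratio` times the best score (hypothetical cutoff)."""
    if not scores:
        return []
    best = max(scores.values())
    kept = [term for term, score in scores.items() if score >= ratio * best]
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(kept, key=lambda term: (-scores[term], term))


if __name__ == "__main__":
    # (similarity to the input document, MeSH terms of the retrieved document)
    neighbors = [
        (0.75, {"Aspirin", "Myocardial Infarction", "Humans"}),
        (0.5, {"Aspirin", "Humans"}),
        (0.25, {"Breast Neoplasms"}),
    ]
    print(select_terms(score_mesh_terms(neighbors)))
    # ['Aspirin', 'Humans', 'Myocardial Infarction']
```

A relative cutoff adapts to how confident the neighborhood is; a fixed top-n or an absolute threshold would be equally valid assumptions.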
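The hierarchical part of the method, a lowest-common-ancestor query and the curators' "share the same ancestor" heuristic, can be illustrated over a small hierarchy. The paper stores MeSH in a graph database; this sketch instead assumes an in-memory child-to-parent dictionary and treats MeSH as a strict tree, although real MeSH descriptors can occupy several tree positions. It also reads "three terms share the same ancestor" as "three or more proposed terms share the same direct parent", which is only one possible interpretation. The hierarchy fragment and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical, simplified fragment of the MeSH "Cardiovascular Diseases" tree,
# stored as child -> parent. Real MeSH is a polyhierarchy; this assumes a tree.
PARENT = {
    "Heart Diseases": "Cardiovascular Diseases",
    "Vascular Diseases": "Cardiovascular Diseases",
    "Arrhythmias, Cardiac": "Heart Diseases",
    "Heart Failure": "Heart Diseases",
    "Myocardial Ischemia": "Heart Diseases",
    "Myocardial Infarction": "Myocardial Ischemia",
}


def path_to_root(term: str, parent: dict[str, str]) -> list[str]:
    """Return [term, parent, grandparent, ..., root]."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(a: str, b: str, parent: dict[str, str]) -> Optional[str]:
    """Deepest node on both root paths, or None if the terms share no ancestor."""
    ancestors_of_b = set(path_to_root(b, parent))
    for node in path_to_root(a, parent):  # walks upward from a, deepest first
        if node in ancestors_of_b:
            return node
    return None


def collapse_siblings(terms: list[str], parent: dict[str, str], threshold: int = 3) -> list[str]:
    """Replace `threshold` or more proposed terms that share a direct parent with that parent."""
    children_by_parent = defaultdict(list)
    for term in set(terms):  # deduplicate so repeated terms are not double-counted
        if term in parent:
            children_by_parent[parent[term]].append(term)
    result = set(terms)
    for common_parent, children in children_by_parent.items():
        if len(children) >= threshold:
            result.difference_update(children)
            result.add(common_parent)
    return sorted(result)


if __name__ == "__main__":
    print(lowest_common_ancestor("Myocardial Infarction", "Heart Failure", PARENT))
    # Heart Diseases
    proposed = ["Arrhythmias, Cardiac", "Heart Failure", "Myocardial Ischemia", "Aspirin"]
    print(collapse_siblings(proposed, PARENT))  # ['Aspirin', 'Heart Diseases']
```

Walking parent pointers costs time proportional to the depth of the hierarchy, which stays small for MeSH; a graph database offers the same traversal as a query over stored relationships.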
Similar resources
Harmonizing bioCADDIE Metadata Schemas for Indexing Clinical Research Datasets Using Semantic Web Technologies
An important role of the NIH Big Data to Knowledge (BD2K) biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) is to promote data integration through the adoption of content standards and alignment to common data elements and high-level schema. The objective of this study was to investigate how a combination of Semantic Web technologies and the ISO/IEC 11179 data element model c...
Deep Learning Based Semantic Video Indexing and Retrieval
We share the implementation details and testing results for a video retrieval system based exclusively on features extracted by convolutional neural networks. We show that deep learned features might serve as a universal signature for the semantic content of video, useful in many search and retrieval tasks. We further show that graph-based storage structure for video index allows to efficiently retrievi...
Classification and Analysis of Frequent Subgraphs Mining Algorithms
In recent years, data mining in graphs, or graph mining, has attracted much attention due to the explosive growth in generating graph databases. A graph database is a type of database that consists of either a single large graph or a number of relatively small graphs. Some applications that produce graph databases are biological networks, the semantic web, and behavioral modeling. Among al...
Interactive Platform for Semantic Gene Expression Analysis of Alzheimer's Disease
In the course of gene expression analysis, data must be interpreted by referencing knowledge bases of genetics, pathways, diseases and drugs. However, because those external resources are often stored in distributed databases in various formats, it is hard for biomedical scientists to use them in combination. Semantic Web technologies are suitable for integration of those heterogeneous ...
RDF Triple Stores and a Custom SPARQL Front-End for Indexing and Searching (Very) Large Semantic Networks
With growing interest in the creation and search of linguistic annotations that form general graphs (in contrast to formally simpler, rooted trees), there also is an increased need for infrastructures that support the exploration of such representations, for example logical-form meaning representations or semantic dependency graphs. In this work, we lean heavily on semantic technologies and in ...
Volume 5
Publication date: 2017